Supervised feature selection by clustering using conditional mutual information-based distances

نویسندگان

  • José Martínez Sotoca
  • Filiberto Pla
چکیده

In this paper, a supervised feature selection approach is presented, which is based on metric applied on continuous and discrete data representations. This method builds a dissimilarity space using information theoretic measures, in particular conditional mutual information between features with respect to a relevant variable that represents the class labels. Applying a hierarchical clustering, the algorithm searches for a compression of the information contained in the original set of features. The proposed technique is compared with other state of art methods also based on information measures. Eventually, several experiments are presented to show the effectiveness of the features selected from the point of view of classification accuracy. & 2010 Elsevier Ltd. All rights reserved.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comments on supervised feature selection by clustering using conditional mutual information-based distances

Supervised feature selection is an important problem in pattern recognition. Of the many methods introduced, those based on the mutual information and conditional mutual information measures are among the most widely adopted approaches. In this article, we re-analyze an interesting paper on this topic recently published by Sotoca and Pla (Pattern Recognition, Vol. 43 Issue 6, June, 2010, p. 206...

متن کامل

Feature Selection in Regression Tasks Using Conditional Mutual Information

This paper presents a supervised feature selection method applied to regression problems. The selection method uses a Dissimilarity matrix originally developed for classification problems, whose applicability is extended here to regression and built using the conditional mutual information between features with respect to a continuous relevant variable that represents the regression function. A...

متن کامل

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...

متن کامل

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...

متن کامل

Clustering of a Number of Genes Affecting in Milk Production using Information Theory and Mutual Information

Information theory is a branch of mathematics. Information theory is used in genetic and bioinformatics analyses and can be used for many analyses related to the biological structures and sequences. Bio-computational grouping of genes facilitates genetic analysis, sequencing and structural-based analyses. In this study, after retrieving gene and exon DNA sequences affecting milk yield in dairy ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Pattern Recognition

دوره 43  شماره 

صفحات  -

تاریخ انتشار 2010